Skip to content

Conversation

nmdefries
Copy link
Contributor

@nmdefries nmdefries commented Aug 23, 2023

Summary:

Pull flusurv data from new CDC API endpoint. Ingest previously unlabelled age groupings. Ingest new race/ethnicity and sex breakdowns. Test flusurv.py functions. Add more comments, messaging, and assertions.

Closes #1247
Closes #242

Note:

  • Some of our CDC contacts are able to provide us with past versions of data from the API. We will want to patch those in. Many of the "new" strata are actually available back to 2009. We'll probably want to patch those in as well.
  • This also requires updates to the database table, the API server, and the API docs that will be made separately.

Prerequisites:

  • Unless it is a documentation hotfix it should be merged against the dev branch
  • Branch is up-to-date with the branch to be merged with, i.e. dev
  • Build is successful
  • Code is cleaned up and formatted

- rename input arg to `update` to avoid reassignment later
- comment and reuse args_insert
- spelling
- comment magic constant used in output format
- rename location-network/catchmentid map
Previously, age strata were numbered sequentially which allowed us to
store rate values by position in a list. With the introduction of the
new strata, this system is not robust enough to track all the different
groups (e.g. ageids are no longer sequential and there are now race and
sex groupings with separate numbering systems).
@nmdefries nmdefries force-pushed the ndefries/flusurv-new-endpoint branch from 734dca9 to f8a6706 Compare September 15, 2023 15:50
@nmdefries nmdefries marked this pull request as ready for review September 15, 2023 16:25
Copy link
Collaborator

@melange396 melange396 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work! i really appreciate the thorough commenting!

youll wanna pull in the changes from the dev branch, PR #1241 added tests for the flusurv endpoint.

@aysim319
Copy link
Contributor

ran the code locally and fails with

found data for 82 epiweeks
[successfully fetched data for network_all]
rows before: 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/src/app/delphi/epidata/acquisition/flusurv/flusurv_update.py", line 213, in <module>
    main()
  File "/usr/src/app/delphi/epidata/acquisition/flusurv/flusurv_update.py", line 204, in main
    update(fetcher, location, args.test)
  File "/usr/src/app/delphi/epidata/acquisition/flusurv/flusurv_update.py", line 155, in update
    raise Exception(
Exception: network_all 202401 data includes new group(s) {'rate_age_1t4', 'rate_age_gte75', 'rate_age_0tlt1'}

@aysim319 aysim319 self-requested a review January 16, 2025 15:41
@nmdefries
Copy link
Contributor Author

Exception: network_all 202401 data includes new group(s) {'rate_age_1t4', 'rate_age_gte75', 'rate_age_0tlt1'}

This should be resolved just by adding those signals to the list of those expected.

@nmdefries
Copy link
Contributor Author

This works now. I added a migration .sql script to add the new columns to the schema.

@nmdefries
Copy link
Contributor Author

@melange396 @aysim319 This is ready for review.

@nmdefries
Copy link
Contributor Author

I duplicated some of the work in #1287 in c88c6fc. (#1287 needs to be re-merged into this PR anyway, see comment.)

@nmdefries
Copy link
Contributor Author

Copy link

@nmdefries
Copy link
Contributor Author

Things to check before putting into prod:

  • DB migration (will need to test in staging)
  • interacting correctly with DB
  • naming new columns correctly and inserting them in the right spots
    • avoid mixed column meanings, e.g. don't change meanings of col names
  • clients work

@@ -0,0 +1,167 @@
### New API response from https://gis.cdc.gov/GRASP/Flu3/PostPhase03DataTool?appVersion=Public
Copy link
Contributor Author

@nmdefries nmdefries Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: the red highlighting here seems to be because the JSON format is invalid (JSON doesn't allow comments and also maybe single quotes are disallowed). However, this file is for reference only, so it doesn't really matter.

@brookslogan
Copy link
Contributor

brookslogan commented Oct 9, 2025

Things to check before putting into prod:

  • naming new columns correctly and inserting them in the right spots
    • avoid mixed column meanings, e.g. don't change meanings of col names

I did a quick skim of the DB migration script and some of the acquisition code dealing with column names and ids. The former passed a sanity check and didn't seem like a risky operation anyway. The latter seems to have been written with a good deal of care (e.g., the changes to the overall age group column id and the new age group column ids). I don't have an environment to mock stuff but if you need a set of eyes to sanity-check the acquisition result after deployment, feel free to ping me and I can double-check API vs. upstream values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

flusurv data is stale flusurv acquisition is broken
4 participants